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Abstract 

Background: Ticks are blood-sucking arthropods and a primary function of tick salivary proteins is to counteract the 
host's immune response. Tick salivary Kunitz-domain proteins perform multiple functions within the feeding lesion and 
have been classified as venoms; thereby, constituting them as one of the important elements in the arms race with the 
host. The two main mechanisms advocated to explain the functional heterogeneity of tick salivary Kunitz-domain 
proteins are gene sharing and gene duplication. Both do not, however, elucidate the evolution of the Kunitz family in 
ticks from a structural dynamic point of view. The Red Queen hypothesis offers a fruitful theoretical framework to give a 
dynamic explanation for host-parasite interactions. Using the recent salivary gland Ixodes ricinus transcriptome we 
analyze, for the first time, single Kunitz-domain encoding transcripts by means of computational, structural bioinformatics 
and phylogenetic approaches to improve our understanding of the structural evolution of this important multigenic 
protein family. 

Results: Organizing the /. ricinus single Kunitz-domain peptides based on their cysteine motif allowed us to specify a 
putative target and to relate this target specificity to lllumina transcript reads during tick feeding. We observe that several 
of these Kunitz peptide groups vary in their translated amino acid sequence, secondary structure, antigenicity, and 
intrinsic disorder, and that the majority of these groups are subject to a purifying (negative) selection. We finalize by 
describing the evolution and emergence of these Kunitz peptides. The overall interpretation of our analyses discloses a 
rapidly emerging Kunitz group with a distinct disulfide bond pattern from the /. ricinus salivary gland transcriptome. 

Conclusions: We propose a model to explain the structural and functional evolution of tick salivary Kunitz peptides 
that we call target-oriented evolution. Our study reveals that combining analytical approaches (transcriptomes, 
computational, bioinformatics and phylogenetics) improves our understanding of the biological functions of important 
salivary gland mediators during tick feeding. 

Keywords: Red Queen hypothesis, Tick saliva, Kunitz-domain proteins, Cysteine motif, Structural bioinformatics, 
Antigenicity, Protein disorder, Molecular clock, Evolution 



Background 

Ticks have been successful as ectoparasites due to mor- 
phological and physiological adaptations [1]. As obligate 
hematophagous (blood-sucking) arthropods, ticks trans- 
mit various bacterial and viral diseases, e.g., babesiosis, 
theileriosis, anaplasmosis, Lyme disease and tick-borne 
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encephalitis, and, thus greatly impact human and animal 
health [2]. To combat host defense mechanisms, ticks 
possess powerful pharmacological proteins in their saliva 
that are injected into the vertebrate host during blood 
feeding [3,4]. Our current understanding of host-parasite 
interactions has been greatly impacted by the revolution 
in sequencing technologies that has provided a massive 
amount of data previously unimaginable [5]. Among the 
various transcriptome studies to date, transcriptome 
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projects from arthropod disease vectors are using Sanger 
or next generation sequencing techniques. These include 
several salivary gland (SG) transcriptome and proteome 
projects (also known as sialomes) from hard tick [6-18] 
and soft tick species [19-22]. Hundreds of novel SG 
transcripts that putatively encode proteins were discov- 
ered from these sialome projects, thus elucidating how 
ticks may complete a blood meal while providing insight 
towards their evolutionary expansion [23] . 

Available SG transcriptomes show that the most fre- 
quently secreted tick salivary protein families are lipoca- 
lins, enzymes, and protease inhibitors. Among protease 
inhibitors, Kunitz-domain transcripts are one of the most 
abundant protein families in tick SGs [9,18,19,21]. The 
archetypal Kunitz fold is highly conserved, resembling the 
first Kunitz-domain protein, the bovine pancreatic trypsin 
inhibitor (BPTI) that was functionally described in 1936 
by Moses Kunitz [24]. To date, about 15 single Kunitz- 
domain peptides from ticks have been functionally charac- 
terized. Classical serine protease inhibitors were analyzed 
from the hard ticks Rhipicephalus microplus (BMCL) [25], 
Rhipicephalus appendiculatus (TdPI) [26], Rhipicephalus 
haemaphysaloides (Rhipilin-1) [27], Haemaphysalis longicor- 
nis (HlChl, HIMKI and Haemangin) [28-30], Amblyomma 
cajennense (Amblyomin-X) [31], and Ixodes scapularis 
(Tryptogalinin) [32]. Protease inhibitors were also character- 
ized from the soft ticks Ornithodoros moubata (TAP) [33] 
and Ornithodoros savignyi (FXal) that mainly function as 
anti-clotting agents. Anti-platelet inhibitors were also identi- 
fied as single Kunitz-domain inhibitors, such as the Mono- 
grins (1A and IB) from Argas monolakensis [34], and, their 
orthologs Disagregin [35] and Savignygrin from Ornitho- 
doros spp. [36]. Several tick salivary Kunitz-domain proteins 
that possess multiple domains (1-7 Kunitz-domains) were 
also characterized as serine protease inhibitors [37-39]. Of 
all the tick SGs Kunitz-domain proteins, however, single 
Kunitz-domain peptides are highly represented (we will refer 
to these single domains as Kunitz peptides, henceforth) 
[9,12]. These Kunitz peptides vary in their cysteine (Cys) 
motifs (possessing more or less than 6 Cys residues) with 
some lacking the archetypal disulfide bonds causing a more 
flexible fold, therefore diversifying their inhibitory activity 
[26,32,40]. 

Kunitz SG peptides of I. scapularis were phylogenetically 
analyzed to uncover their emergence in ticks and the 
expression trends of these Kunitz peptides were also sta- 
tistically analyzed [23]. These Kunitz peptides were cate- 
gorized in three different groups (groups I, II and III) 
based on their Cys motif. The I. scapularis Kunitz peptides 
belonging to group I were suggested to represent the an- 
cestor of all tick Kunitz-domain family (single and mul- 
tiple domains). Several Kunitz peptides seemed to have 
lost their ability to function as serine protease inhibitors 
and instead to block and/or modulate ion channels, 



possibly related to the ticks necessity for prolonged feed- 
ing on the vertebrate host [23]. The authors are aware of 
only one study that functionally and structurally character- 
ized a tick Kunitz peptide as an ion channel effector, the 
maxiK channel modulator Ra-KLP from R. appendiculatus 
[40]. This paucity must be remedied since Dai et al. [23] 
putatively characterize the majority of I. scapularis Kunitz 
peptides as ion channel blockers/modulators. Further- 
more, Fry et al. [41] have argued that hematophagous se- 
creted proteins, such as of the Kunitz family, should be 
classified as venomous. Most classified toxins are stabi- 
lized by their disulfide bridges and once these toxins be- 
come functionally essential as a venom, their adaptation is 
often reinforced by gene duplication [41]. Gene sharing 
and gene duplication are the main mechanisms advocated 
to explain the functional heterogeneity of tick salivary 
Kunitz family proteins [23,42] . 

In our study, we used computational, structural bio- 
informatics and phylogenetic methods to reevaluate tick 
salivary Kunitz peptides from a more in-depth structural 
point of view by analyzing the functional, antigenic, and 
evolutionary characteristics of Kunitz peptides from the 
recently annotated Ixodes ricinus SG transcriptome [18]; 
GenBank Bioproject PRJNA177622. Compared to clas- 
sical biochemical analyses and classical Sanger sequen- 
cing techniques that revealed only a few thousands of 
sequences from tick transcriptome studies presented 
until today (with the exception of the A. maculatum 
transcriptome), the large amount of available data of the 
454 pyrosequencing/Illumina /. ricinus SG transcriptome 
makes it feasible to thoroughly analyze multigenic pro- 
tein families (i.e., the Kunitz-domain family). In the /. 
ricinus transcriptome, 4% of all Illumina reads were 
classified as transcripts encoding for Kunitz-domain pro- 
teins and of these, 1.4% accounted for single Kunitz- 
domain transcripts (that putatively encode archetypal 
Kunitz peptides possessing more or less 6 Cys residues) 
[18]. Our approach demonstrates how to interpret next 
generation transcriptome data to expand our under- 
standing of the molecular, structural and functional evo- 
lution of hematophagy in ticks. Our overall analysis 
developed a novel concept that elucidates the divergence 
and adaptation of I. ricinus salivary Kunitz peptides that 
we call target-oriented evolution. 

Results and discussion 

Organizing the different /. ricinus SG Kunitz peptides based 
on their Cys motifs facilitates functional predictions 

From the 454/Illumina SG I. ricinus transcriptome [18], 
we found archetypal Kunitz peptides containing 6 Cys res- 
idues and variants containing an additional 7 th Cys residue 
adjacent to the conserved 5 th Cys residue (Figure 1). Based 
on their Cys motif we were able to organize these tick sal- 
ivary Kunitz peptides into 11 groups (Gl-Gll; Figure 1A) 
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Figure 1 /. ricinus Kunitz groups based on Cys motif. Kunitz peptides from the annotated transcriptome of /. nanus were organized into 1 1 
groups (G1-G1 1) based on their Cys motif (C = Cys and X = intra-Cys residues). The Cys motif for each group is depicted in Panel A along with 
their predicted disulfide bond patterns (in brackets). The alignment of one representative for each group (B) shows the conserved Cys residues 
(red), the positions of aa variability (yellow), and the key residues that interact with serine proteases (blue) and ion channels (green). The box 
around G3, G6 and G1 1 indicates that they do not contain all of the standard residues for serine protease (blue) or ion channel (green) 
interactions. Two separate alignments in Panel C depict the key conserved residues for serine protease inhibition (blue) using BPTI (UniProt: 
P00974) and ion channel blockage (green) using beta-bungarotoxin (UniProt: P00989). Residues highlighted in black are conserved. 
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and a group with <4 Cys in their amino acid (aa) 
sequences - we will refer to these peptides as simple 
Kunitz (SK), hereafter. The Pfam server (http://pfam. 
sanger.ac.uk/) verified that all 11 groups plus the SKs pos- 
sess the conserved single Kunitz-domain. Group G6 is the 
only /. ricinus Kunitz peptide group that possesses a differ- 
ent disulfide bond pattern since it lacks the archetypal 
Kunitz/BPTI Cys2 and Cys4 disulfide bridge (Figure 1A). 
The 3D modeled structures of all group representatives 
from Figure 1 show that the overall archetypal tertiary 
Kunitz fold is highly conserved regardless of the deviated 
disulfide bridge pattern for G6 (Additional file 1A-B). Fur- 
thermore, the tertiary Kunitz fold for each group represen- 
tative does not structurally deviate much from each other 
(Additional file 1C and Additional file 2). The deviation 
from the archetypal Cys motif (as in G6), however, has 
been shown to be functionally flexible [26,32,40]. 

Aligning a representative of each /. ricinus Kunitz pep- 
tide group (Figure IB) in our study shows that specific res- 
idues coincide with the putative functions described by 
Dai et al. [23], classified as serine protease inhibitors or 
ion channel blockers/modulators. Groups G3, G6 and 
Gil, however, do not contain these conserved residues 
(boxed in Figure IB). With the exception of the annotated 
I. scapularis sialome [6,9], both protein BLAST [43] and 
PSI-BLAST [44] searches do not reveal any known or 
functionally described single Kunitz-domain peptide with 
the Cys motif of G3, G6 and Gil. The proteins that 
slightly resemble the Cys motif of G6, and lack the arche- 
typal Cys2 and Cys4 disulfide bridge, are the I. scapularis 
salivary Kunitz proteins Ixolaris [38] and Penthalaris [39], 
both possessing multiple Kunitz domains, and some mem- 
bers of the Kunitz family found in R. microplus [45]. An 
alignment using the classical Kunitz serine protease inhibi- 
tor BPTI (UniProt: P00974) with Gl and G7 shows the con- 
served positively charged arginine (Arg or R) or lysine (Lys 
or K), known as the PI site that interacts with the negatively 
charged active site of serine proteases (highlighted blue in 
Figure 1C). Furthermore, the key residues for the ion chan- 
nel blocker beta-bungarotoxin (UniProt: P00989), from the 
venomous snake Bungarus multicinctus [46], are also con- 
served in G2, G4, G5 and G8-G10 (highlighted green in 
Figure 1C). 

By organizing the Kunitz peptides from the I. ricinus 
transcriptome into specific groups, we were also able to 
visualize specific profiles occurring in the transcript reads 
among the four independent cDNA libraries, suggesting 
that there may be a correlation between the average tran- 
script read value and putative function for each group. 
The three groups with the highest number of transcripts 
were Gl, G2 and G6 (G6 being the largest group). 
Figures 2A-C depicts the average Illumina reads for all 
groups obtained from the L ricinus transcriptome, comple- 
menting our functional predictions. We notice that three 



profiles are occurring among the organized transcripts en- 
coding for Kunitz peptides. Figure 2A (Gl and G7) shows 
that the average transcript read level (Kunitz encoding tran- 
scripts that we classify as putative protease inhibitors) is 
low at the early stages of tick feeding for both nymphs (EN: 
6 h to 24 h of feeding on the host) and adults (EA: 6 h to 
2 days of feeding on the host) compared with the increased 
transcript reads during late feeding (LN: 2-4 days of feed- 
ing and LA: 3-7 days of feeding). Soft ticks have a shorter 
feeding cycle than hard ticks and this phenomenon has 
been thoroughly explained by comparing the evolutionary 
divergence and emergence of serine protease inhibitors in 
hard ticks and soft ticks (about 120 and 92 MYA) [42]. Giv- 
ing that hard ticks (like L ricinus) feed on blood for longer 
periods of time (females: >1 week) and that serine proteases 
are involved in blood coagulation and immune responses, 
we expect a higher transcript read encoding for serine pro- 
tease inhibitors during late feeding stages as depicted in 
Figure 2 A (for both nymphal and adult life cycles). 

We putatively classify G2, G4, G5 and G8-G10 as ion 
channel blockers and/or modulators (Figure 1C) and as 
Figure 2B shows they exhibit a separate profile than the 
serine protease inhibitors of Figure 2A (Gl and G7). These 
putative ion channel blockers/modulators show lower 
transcript reads during the nymphal stages of feeding 
(both early, EN, and late, LN), but the reads are increased 
in adult ticks during the early stage of blood feeding (EA). 
The profile then decreases during the late stages of feeding 
(Figure 2B). A possibility, as noted in mayfly nymphs SGs, 
for the expression of ion channel blockers/modulators in 
I. ricinus nymphs may partially be due to maintaining 
osmoregulatory function during molting [47]. To resist 
desiccation, tick nymphs rely on both water vapor and 
water retention, while adult ticks only rely on enhanced 
water retention [48]; therefore, this may explain the in- 
creased transcript read encoding for ion channel blockers/ 
modulators during adult feeding stages (compared with 
nymphs - Figure 2B). In Figure 2C we show that G6 devi- 
ates from the other two profiles, however, G3 and Gil 
have similar profiles to that of ion channel blockers/mod- 
ulators (Figure 2B). The distinct profile and Cys motif G6 
displays suggests that this group may belong to (a) a com- 
bination of serine protease inhibitors and ion channel 
blockers/modulators, (b) either serine protease inhibitors or 
ion channel blockers/modulators, or, (c) a completely new, 
undefined function - i.e., gene sharing (moonlighting). 

The /. ricinus SG Kunitz groups possess intra-Cys residue 
variability, assemble into various secondary structure 
clusters, and vary in antigenicity and protein disorder 

The aa positions with high variability (highlighted yellow 
in Figure IB) for the intra-Cys residues of each group 
were calculated by the Protein Variability Server [49,50] 
using the Shannon entropy equation (see Methods) [51] 
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Figure 2 Average lllumina reads of nymphal and female /. ricinus Kunitz-domain transcripts. Each Cys motif group (G1-G1 1) was graphed 
to depict their overall distribution of their average lllumina reads (y-axis) sequenced from four independent /. ricinus SG cDNA libraries (x-axis) - 
two nymphal stages (EN and LN) and two adult stages (EA and LA). For a detailed account on the specifics of these /. ricinus cDNA libraries see 
Schwarz et al. 2013 [18]. Panel A represents putative serine protease inhibitors (G1 and G7) with a distinct lllumina read profile in nymphs and 
female adult SGs of their encoding transcripts compared with the encoding putative ion channel blockers/modulators of panel B (G2, G4, G5 and 
G8-10). Panel C depicts that G3 and G1 1 share a similar profile to ion channel blockers/modulators (B) and that Kunitz-domain transcripts of G6 
show a third profile in ticks completely distinct from all other groups. Standard deviations are represented as error bars. 



are represented in Figures 3A and D as the ratio of 
highly variable/highly conserved sites (V/C) and the 
average of aa variability (AVE) per group, respectively. 
On one hand, we consider G3, G5, G7, G10 and Gil are 
"frozen" groups having low aa variability and high pro- 
portion of highly conserved sites. On the other hand, 
Gl, G2 and G6 are "melted" groups or highly variable. 
Due to their low aa variability, the former assemblage 
may be considered highly adapted in the host-parasite 
interaction from a functional and/or immunological 
point of view. The second assemblage may still be under 
evolutive pressures that force aa variability. 

To demonstrate the structural outcome of these "frozen" 
and "melted" clusters, we predicted the secondary struc- 
ture for each Kunitz peptide in all groups. We show that 
22 models depict the variability of secondary structure 
among the 145 Kunitz peptides (excluding SK peptides) 
(Additional file 3). Five models in Additional file 3 (Repre- 
sentative models Gl: Ir2:98; Gl: Ir2:4878; Gl: Ir2: 19262; 



G6: Ir2:1982 and G7: Ir2: 4792) are combined (3-strand 
and a-helical secondary structure models, differing in 
length, amount and order of a-helices and (3-strand, and 
describe 81.6% of the secondary structure variation 
present in the /. ricinus Kunitz/BPTI family. All the groups 
contain different amounts of secondary structure models 
(Figure 3B and Additional file 3). It seems, however, that 
the secondary structure presented in lr2-4878 and lr2-98 
of Gl is the structural skeleton for the remaining second- 
ary structures; these structures are also present in 10 of 
the 11 groups and explain 50% of the secondary struc- 
tures. The fact that Gl represents the structural skeleton 
may be since its Cys motif (as displayed in Figure 1A) is 
considered the ancestor of Kunitz peptides in /. scapularis 
[20]. Structural convergence, however, has been reported 
for venoms, a general classification that the Kunitz protein 
family belongs to [Revised in 41]. 

When analyzing the aa variability (V/C or AVE) in re- 
lation to the secondary structure models per group 
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Figure 3 Graphical representations of amino acid and secondary structure variability. The relation between highly variable and low 

variable sites (V/C) among the groups is shown (A) The number of secondary structure models per groups is also presented (B). Correlation 

between the average of aa variability (AVE) and the ratio V/C (C). Average of aa variability estimated by Shannon entropy (D) and the correlation 

between the ratio V/C and 2D models, without and with group 6 (E-F). 
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(excluding G6) a high correlation is given (R 2 = 078; 
Figure 3E); however, by including G6 in this analysis the 
correlation decreases (R 2 = 0.18; Figure 3F). This dictates 
that the aa variability per group is partly justifiable for 
structural reasons (functionality), except in G6. Figure 3C 
shows that ratio V/C or AVE of aa variability are corre- 
lated, therefore, using V/C or AVE of aa variability is 
interchangeable for Figure 3E and F. The presence of 
different folds (i.e., secondary structures) suggests that 
gene sharing events may have occurred throughout the 
evolution of I. ricinus and may cause additional and 
unique functional properties. Such moonlighting pro- 
teins may explain the diverse inhibitory effect of tick sal- 
ivary Kunitz peptides. 

To elucidate additional molecular properties, we ana- 
lyzed both the antigenic properties and protein disorder 
for all I. ricinus Kunitz peptide groups in our study. Our 
analysis reveals that the regional disorder was located at 
both termini for the majority of peptides (data not 
shown). It is worth noting that antigenic properties 
involve the mobility of the C- and N- termini [52], In 
Figure 4A we show that G6 has the highest antigenicity 
mean (Vaxijen score). G6 has a "misplaced" Cys2 (arche- 
typal) and lack of this Cys2 is responsible for the func- 
tional flexibility of the Kunitz-domain binding loop, that 
may permit target-specific interactions other than the 
catalytic domain of serine proteases [53]. We also notice 
that groups with only two sequences vary in antigenicity 
(Figure 4A) that also coincides with differences in sec- 
ondary structure (Additional file 3). Functional flexibility 
(or mobility) is also demonstrated in G6, since it has a 
higher variability in intrinsic protein disorder (Figure 4B). 



Intrinsically disordered regions (or proteins) increase 
molecular recognition because of an ability to fold differ- 
ently upon binding and possess large interacting surfaces 
[54]. Interestingly, G6 has one sequence that is com- 
pletely disordered (Ir2-10518) that is also highly anti- 
genic (>0.7). 

The I. ricinus SG Kunitz transcripts are under different 
selective pressures 

The Red Queen hypothesis was proposed by Van Valen 
in 1973 [57] to explain the molecular evolution through 
evolutive interactions of prey-predator or host-parasite. 
The hypothesis establishes "that a proportional amount 
of successful response by one species produces a total 
negative effect on other species[..] To maintain itself as 
before, the [other] species must increase its fitness [in a 
proportional amount]". The /. ricinus SG Kunitz encod- 
ing transcripts may adhere to the Red Queen hypothesis, 
where inverse selection (positive or negative) is a series 
of one species adapting (the tick) and the other counter- 
adapting (the host). To elucidate or validate the type of se- 
lective pressure (positive or purifying) each I. ricinus Kunitz 
peptide group has undergone throughout its evolution we 
used the Synonymous Non-synonymous Analysis Program 
(SNAP) (http://www.hiv.lanl.gov) and Datamonkey servers 
[58] (see Methods for further details). 

Assuming that a ratio >1 between the non-synonymous 
nucleotide substitutions per site (d N ) with the synonym- 
ous substitutions (d s ) is due to a positive selection and a 
ratio <1 is due to a purifying selection, we note in Table 1 
that the majority of groups are under a purifying selec- 
tion. As we depict in Figure 4C, only G4 had a strong 
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Figure 4 Antigenicity and intrinsic protein disorder. Variations in antigenicity (A) for each group as predicted by the VaxiJen server [55] that 
uses an alignment-independent algorithm based on the physicochemical properties of proteins [55]. Ratio of disordered compared to the total 
sequence (B) for each group as predicted by the MetaDisorder server that builds a consensus from 13 protein disorder predictors [56]. Panel C 
shows the correlation for G4 between antigenicity (x-axis) and intrinsic protein disorder (y-axis). 



correlation between disorder and antigenicity; coinciden- 
tally, G4 is the only group that has undergone positive se- 
lection (Table 1). This correlation (Figure 4C) suggests 
that the disorder in G4 may be under positive selection 
due to the immune system of the host, since protein mo- 
bility (disorder) may influence the antigenic properties of 
proteins [52]. Our SNAP analysis, however, only displays 
the average of synonymous and the non-synonymous nu- 
cleotide substitutions per site, therefore, is not to say that 
selection per site was not positive or negative for specific 
members of the remaining groups. We therefore used 
Datamonkey to verify our SNAP analyses for evidence of 
positive selection and to determine which site along the 
nucleotide sequence are undergoing positive (natural) or 
purifying (negative) selection. 

Overall, the ratio d N ld s from Datamonkey slightly dif- 
fer from that of SNAP. The Datamonkey analyses show 
that G2, G4 and G6 possess evidence for positive selec- 
tion and note the type of selection for each site, using 



the SLAC algorithm (see Methods). Figure 5 graphically 
displays these results along the entire transcript, codon- 
by-codon (x-axis). Both G5 and G10 do not possess any 
sites under any selective pressure, while the remaining 
groups possess positive, negative (purifying) or both (see 
Table 1). As depicted in Figure 5, the majority of se- 
lected sites are negative (purifying) and groups G2 and 
G6 possess a few positively selected sites; however, there 
are more negatively selected sites for G2 and G6 than 
positively selected sites. In a recently published book, 
Eugene Koonin [61] explains that purifying (negative) se- 
lection, in some phases of evolution, is more common 
(orders of magnitude more common) than positive se- 
lection. Koonin considers purifying selection is the de- 
fault process of elimination of the unfit. With this regard 
we understand that Kunitz-domain proteins - taking 
into account the number of members in general - 
should undergo a massive purifying selection in order to 
shape (constrain) the molecular diversity of this tick SG 
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Table 1 Statistical analysis on nucleotide substitution 



SNAP 



Datamonkey 







d N 






d s 












Group 


Mean 


o 


Variance 


Mean 


o 


Variance 


a N /a s 


a N /a s 


Evidence 


***Selection type 


1 


0.56 


0.02 


0.006 


1.01 


0.18 


0.03 


0.56 


0.69 


No 


Negative 


2 


0.34 


0.06 


0.003 


0.56 


0.07 


0.005 


0.61 


1.03 


Yes 


Positive/negative 


4 


0.25 


0.03 


0.001 


0.17 


0.03 


0.001 


1.47 


1.75 


Yes 


Positive 


5 


0.31 


0.21 


0.045 


0.88 


0.37 


0.14 


0.35 


0.62 


No 


None 


6 


0.28 


0.04 


0.001 


0.36 


0.04 


0.002 


0.78 


0.94 


Yes 


Positive/negative 


8 


0.39 


0.08 


0.006 


0.7 


0.09 


0.008 


0.56 


0.72 


No 


Negative 


9 


0.17 


0.03 


0.001 


0.34 


0.05 


0.002 


0.5 


0.51 


No 


Negative 


10 


0.15 


0.03 


0.001 


0.35 


0.07 


0.005 


0.43 


0.53 


No 


None 



*The ratio between non-synonymous [d N ) and synonymous [d s ) nucleotide substitution per site analyzed by SNAP and by Datamonkey via SLAC [59]. 
^Evidence for positive selection (p-value < 0.05) analyzed by Datamonkey via PARRIS [60]. 
***Evidence for selection type for specific sites (p-value < 0.05) analyzed via SLAC. 



peptide family. Additionally, as noted by the reduced 
gene duplications events in platypus venom compared 
with the massive gene expansions reported in cone snails 
and spiders [62], selective pressures may fluctuate ac- 
cording to venom function. 

The evolution of /. ricinus SG Kunitz peptide groups 

The phylogeny of the I. ricinus Kunitz peptides from our 
study was reconstructed and divergence times were esti- 
mated. Maximum likelihood (ML) and Bayesian phylogen- 
etic methods resulted in similar tree topologies (Figure 6 
and Additional file 4A). Four main clades containing 9 out 
of 11 L ricinus Kunitz peptide groups were well supported 
in the ML (Additional file 4A: bootstrap values >72, marked 



in red) and Bayesian trees (Additional file 4B: posterior 
probabilities > 0.7, marked in red). The largest clade out of 
these four clades was composed of only Kunitz peptides of 
prostriate ticks, namely from I. scapularis and I. ricinus 
groups G2, G4-5, G8-10 that we characterize as ion chan- 
nel blockers/modulators. Group Gil also appeared in this 
clade and, regarding its transcript read profile (Figure 2C), 
Gil may also function as an ion channel blocker/modula- 
tor like the other Kunitz members of that clade. Although 
the maxiK channel modulator Ra-KLP from R. appendicu- 
latus did not group with the "ion channel blocker/modula- 
tor" clade, prostriate tick salivary proteins may completely 
function differently than metastriate tick proteins. Thus, 
our characterization of G2, G4-5, G8-11 as ion channel 
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Figure 5 Positively and negatively selected sites of /. ricinus Kunitz transcripts. Codon-by-codon (x-axis) of the nucleotide substitution 
selected sites (positive or negative; y-axis) were calculated by the Datamonkey server [58] via the SLAC algorithm [59]. The graph depicts the 
groups that had evidence {p-value < 0.05) for selection as indicated in Table 1. Of note, G3, G7 and G1 1 (and simple Kunitz) were not included 
since they contained only two sequences. 
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Prostriata clade: 
G2,G4,G5,G8,G9,G10,G11 
Ion channel blockers/modulators 



Prostriata/Argasidae 
clade: 

G3 

Anti-clotting inhibitors 



Argasidae clade: 
Anti-platelet 
inhibitors 




Prostriata clade: 
G6 
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Figure 6 Molecular clock analysis of /. ricinus Kunitz peptides. Kunitz nucleotide sequences of/, ricinus groups (G1-G1 1 and SK) and of ticks 
and spiders (defined by GenBank accession numbers) were used to infer divergence times within ticks by a Bayesian uncorrelated relaxed lognormal 
molecular clock model. Four taxon sets were calibrated according to Jeyaprakash and Hoy [63] and the nodes with the recovered divergence times 
were labeled: Araneae/Scorpions/Pycnogonida/Acari 459±18 MYA (node S), Argasidae 214±28 MYA (node A), Prostriata 196±27 MYA (node P) and 
Metastriata 134±22 MYA (node M). The figure shows a maximum clade credibility tree including the mean divergence times (MYA) at each node. 
Well supported nodes and matching ML and Bayesian analyses (bootstrap values >70%, posterior probabilities >0.7, see Additional file 4) are indicated 
with black circles. Nodes with gray circles appeared in the ML and Bayesian trees, but they were not well supported (bootstrap values <70% and 
posterior probabilities <0.7). Black squares indicate well supported nodes by the Bayesian analysis (posterior probabilities >0.7), but were not resolved 
in the ML tree. Gray squares indicate well supported Bayesian tree nodes (posterior probabilities >0.7) that were less supported in the ML analysis 
(bootstrap values <70%). Non-labeled nodes were not well supported in both analyses (bootstrap values <70% and posterior probabilities <0.7) and 
they did not match in both phylogenetic reconstructions. The four best-supported clades (supported well by both analytical methods) are labeled with 
red circles at the nodes. The red star at the node highlights a clade containing G8 and G1 1 members at a divergence time of 71 MYA that is similar to 
the divergence time of G6. The scale bar in MYA is at the bottom center of the tree. 



blockers/modulators may still be valid and future experi- 
mental verifications will clarify this uncertainty. 

The second largest clade also contained only prostriate 
Kunitz peptides composed mainly of L ricinus group G6. 
As in the ion channel blocker/modulator clade, no func- 
tionally described Kunitz appeared in this clade, therefore 
any possible function remains unknown. Functionally 



described Kunitz peptides were present in the remaining 
two largest clades. One of these clades was made up of 
only argasid Kunitz peptides that included anti-platelet in- 
hibitors. This group seem to evolved independently from 
hard ticks as previously postulated by Mans et al. [42], 
possibly due to different blood feeding behavior compared 
with hard ticks. The other clade mainly contained /. 
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scapularis peptides together with /. ricinus group G3 
and anti-clotting inhibitors of Ornithodoros spp.. Our 
aa sequence and structural analyses of G3 did not re- 
veal any putative function for this /. ricinus group, al- 
though the transcript profile of G3 was similar to 
that of ion channel blockers/modulators (as Gil in 
Figure 2C). However, since G3 clusters with anti-clotting 
inhibitors and not with Gil may suggest a different func- 
tion during tick feeding. I. ricinus groups Gl and G7 clus- 
tered together with hard and soft tick Kunitz peptides, but 
their relationships are uncertain since they were not well 
supported in the ML and Bayesian trees (Additional file 
4). Nevertheless, in our Bayesian analysis, /. ricinus groups 
Gl and G7 appeared with functionally described serine 
protease inhibitors and G7 proteins were also grouped 
together with Gl proteins in our ML analysis. Both our 
phylogenetic and structural analyses of the I. ricinus 
groups Gl and G7, suggests that these two groups may 
act as serine protease inhibitors. SK peptides appeared 
throughout the phylogram. Some SKs were similar in their 
aa sequence (Additional file 5) with the other I. ricinus 
Kunitz peptides assuming a similar function during blood 
feeding. Others SKs may be more functionally distinct, 
such as Ir2-24967-SK, since it differs in its aa sequence 
from all other /. ricinus Kunitz peptides. 

Our molecular clock analysis of tick and spider Kunitz 
peptides recovered similar divergence times for the main 
splits of 1) spiders from ticks (426 MYA, 95% HPD: 389- 
462 MYA, node S), 2) soft ticks from hard ticks (233 MYA, 
95% HPD: 198-266 MYA, node A) and 3) prostriate ticks 
within Ixodidae (221 MYA, 95% HPD: 181-254 MYA, node 
P) as calibrated for these taxon sets; the tree root was esti- 
mated as 436 MYA (95% HPD: 395-479 MYA) (Figure 6, 
Additional file 4B). Most of these splits were well supported 
in the chronogram. Metastriate Kunitz peptides (fourth 
taxon set), however, started to evolve earlier than calibrated. 
Nevertheless, this split is not well supported in the chrono- 
gram as previously stated (193 MYA, 85% HPD: 160-228 
MYA, Figure 3, node M). Although L scapularis homologs 
of L ricinus group Gl Kunitz peptides are considered to be 
the ancestral form of all Kunitz-domain proteins [23], our 
phylogenetic reconstructions of L ricinus Kunitz peptides 
cannot confirm these findings. Even though Gl is not well 
supported in the ML and Bayesian analyses, it can be as- 
sumed that this group of proteins evolved convergently in 
ticks as both phylogenetic trees indicate (Additional file 4). 
Expansion of protein families due to gene duplication 
events has been previously reported for ticks [23,64] and 
can explain the convergent evolution of Gl. Overall, the L 
ricinus Kunitz peptides in our molecular clock analysis 
underlie recent gene duplication events that explains how 
several members of the I. ricinus groups and other tick 
Kunitz peptides do not strictly follow the calibrated diver- 
gence times within the taxon sets in the phylogenetic trees. 



Target-oriented evolution, a model for /. ricinus SG Kunitz 
peptide evolution 

The Red Queen hypothesis [57] explains that host- 
parasites interactions must be characterized by constant 
mutations in both systems in order for parasites to effi- 
ciently infect the host and for the host to survive or 
avoid the parasitic attack. We understand the evolution 
of L ricinus salivary Kunitz family in this dynamic frame- 
work. In the following paragraphs we explain our basis 
to propose one model of target-oriented evolution for 
this family of proteins and briefly discuss our findings 
regarding the evolution of venomous proteins. 

Our model considers G6, compared with all other /. 
ricinus Kunitz peptide groups, as a cornerstone in un- 
derstanding the Kunitz family evolution of I. ricinus. 
Firstly, members of G6 show unique molecular, struc- 
tural and evolutive properties by possessing the highest 
amount of intra-Cys residues between Cys2 and Cys4 
(Figure 1), a distinct Alumina read profile (Figure 2C), a 
high aa variability (Figure 3 A and D), a higher number 
of different secondary structures (Figure 3B), and an 
increased number of negatively selected sites (Figure 5 
and Table 1). Secondly, in the phylogenetic analysis, the 
clade formed by G6 is mainly composed of I. ricinus 
Kunitz peptides (Figure 6 and Additional file 4), also 
representing the largest monophyletic clade suggesting a 
recent and specific evolution for this group in /. ricinus. 
Thirdly, based on the molecular clock analysis G6 is the 
fastest evolving group, diverging about 70 MYA ago and 
having the highest number of members. Taking into 
account the time of divergence and the number of mem- 
bers, we should also consider G6 as an example of pos- 
sessing an accelerated rate of evolution that has been 
reported for several other families of venomous proteins 
[5]. At this point we consider it important to make a dis- 
tinction. As depicted in the phylogenetic tree (Figure 6) 
members of other groups appear to evolve around the 
same time point as G6. It is crucial to note here that 
even when some genes show the same time of evolutive 
divergence, as they belong to a distinct group, they do 
have different functional states at the moment of diver- 
gence. For example, our tree depicts a clade containing 
members of G8 and Gil that diverged 71 MYA ago 
(Figure 6, marked with a red star), but, as we previously 
showed (Figures 1, 2, 3B, 4A-B), these groups have dif- 
ferent properties compared with G6. In this way we 
consider that molecular mechanisms working for the ob- 
served Kunitz family expansion (e.g., gene duplication) 
must be operating on the whole genome background, 
thus, functionally and structurally different gene(s) may 
show genetic amplification at similar evolutive times. 
But we consider that the genetic expansion must be 
occurring at different rates for different groups of 
genes, depending on the function of the gene and its 
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importance in the interaction with the host - as previ- 
ously reported for venomous protein families [62] . 

We consider that G6 is an important proof that tick saliv- 
ary Kunitz peptides possess a target-orientated evolution. 
By target-orientated evolution we mean the molecular 
properties that arise in the parasitic proteins through its in- 
teractions with the host targets and the host immune 
system that eventually determine its structural state. Our 
model (Figure 7) takes into consideration that different 
groups are undergoing different evolutive pressures (Table 
1). This may explain, from a structural point of view, the 
evolution of the various functions found in the Kunitz fam- 
ily. In the model, we propose three main evolutionary cat- 
egories: "working", "short-term" and "long-term" (Figure 7). 
The "working" group (i.e., G6) contains an increased num- 
ber of highly variable members and will eventually reduce 
its number of members and gain in molecular specialization 
due to immunological and functional constrictions. Thus, 
becoming a variable but less flexible, "short-term" evolving 
group (i.e., Gl, G2, G8 and G9). The high variability found 
in Gl and G2 compared with G6 may be interpreted dif- 
ferently. Non-monophyletic Kunitz groups, like Gl and 
G2 may be remnants of variable-monophyletic groups, 
like G6, that are evolutionarily static in expressed traits 
while accumulating genetic changes [65] . Finally, evolution 
will lead to a "long-term" evolved group (i.e., G3, G4, G5, 
G7, G10 and Gil) that is less variable and highly specific 
in their molecular functions. Immune and functional pres- 
sures will remain during the whole process of both differ- 
entiation and maintenance of a gained specificity. As we 
showed in Table 1, /. ricinus SG Kunitz groups are evolv- 
ing under different selective pressures and we predict they 



may have different functions (Figure 1 and Figure 2). 
Genes with different functions in venomous proteins fam- 
ilies have been shown to evolve under different selective 
pressures [5] and thus at a different rate of evolution. In 
this sense we do not claim the three categories of "work- 
ing", "short-term" and "long-term" as relatively exclusive 
to time, but we also refer them to a functional state (see 
above the example of G6 in relation to G8 and Gil). 

Conclusions 

The Red Queen hypothesis is an arms race proposed as 
a model for co-evolution for host-parasite interactions 
[57]. Todays increasing knowledge about the interface 
between host-tick interactions has provided insight that 
ticks successfully feed to repletion by counteracting the 
host immune response, platelet aggregation, inflamma- 
tion and vasoconstriction [4,64,66,67]. We therefore 
infer that two main evolutionary pressures may be driv- 
ing the evolution in I. ricinus Kunitz peptides - i.e., im- 
mune response and the necessity of functional diversity 
against the diverse host molecular targets. The former is 
in agreement with specific immune response against tick 
salivary proteins [68], and, specifically against Kunitz 
members [69]. The pressure for functional diversity on 
tick Kunitz peptides may be that several targets have 
been associated with their inhibitory activity such as 
trypsin, elastase, kallikrein [70], chymotrypsin [71], tissue 
factor complex inhibitor [38] and ion channels [40]. We 
consider our model of target-oriented evolution (Figure 7) 
to explain the evolutionary dynamics found in the I. rici- 
nus SG Kunitz peptides in the framework of the Red 
Queen hypothesis. 
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Immune pressure 

Functional constriction 
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Immune pressure 
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'Evolution of members of the group 




Evolution of members of the group 




Properties of members: 

1- High functional plasticity because of: 

a. High aa variability 

b. Highly disordered 

c. High heterogeneous secondary structures 

d. Structural flexibility (misplacement of second Cys) 

e. Extra aa positions for purifying selection 

2- Immune recognition susceptibility 
a. High antigenicity 



Properties of members: 

1- Low functional plasticity because of: 

a. High aa variability 

b. Highly disordered 

c. Loss heterogeneity in secondary structure 

d. Structual flexibility (established pattern of Cys) 

e. No extra aa. Cleaned by purifying selection 

2- Less immune recognition susceptibility 
a. Medium antigenicity 



Properties of members: 

1- No functional plasticity because of: 

a. No or very low aa variability 

b. Middle disordered 

c. Loss heterogeneity in secondary structure 

d. No structural flexibility (established pattern of Cys) 

e. No extra aa. Cleaned by purifying selection 

2- Low immune recognition susceptibility 
a. Mechanism different than amino acid variability 

for immune escape. 

Figure 7 Proposed mechanism for target-oriented evolution. We infer from this model that aa variability, structural changes and protein 
disorder may have different purposes in what we label as working, short-term and long-term evolved groups of tick salivary Kunitz peptides. For 
working evolved groups we propose that they possess high functional plasticity, but are highly susceptible to immune recognition. In short-term 
evolved groups they possess low functional plasticity and less immune recognition susceptibility. Long-term evolved groups have extremely low 
functional plasticity and possess low immune recognition susceptibility and may possess immune escape mechanisms other than 
antigenic variability. 
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Methods 

Identifying single Kunitz-domain peptides from the 

/. ricinus transcriptome and peptide sequence alignments 

We chose all full-length, secreted and non-truncated sin- 
gle Kunitz-domain peptide sequences from the SG /. rici- 
nus transcriptome [18]. We then submitted all sequences 
to the Pfam server [72] to verify and identify them as sin- 
gle Kunitz-domain peptides. To further verify that these 
sequences are classical Kunitz-domain peptides we per- 
formed a protein BLAST alignment using BPTI (GenBank 
accession no.: P00974) as the query. As a final filtering 
point we submitted the remaining sequences to SignalP 
4.0 server [73], since we were only interested in secreted 
peptides. All signal peptides were removed from all se- 
quences and this resulted in a total of about 200 mature, 
single Kunitz-domain peptide sequences. 

Only sequences that possessed a minimum of 6 Cys resi- 
dues and a maximum of 7 Cys residues were used for our 
analysis. The majority of functionally described tick Kunitz 
peptides possess 6-7 Cys and usually >7 Cys are proteins 
with more than a single Kunitz-domain - with a few doc- 
umented exceptions [26,40] . We specifically wanted to in- 
vestigate the archetypal Kunitz-domain peptides of /. 
ricinus (hence, those possessing more or less 6 Cys resi- 
dues). All peptides were divided into groups based on their 
Cys motif (i.e., the number intra-Cys residues) and a group 
was defined as whether there were two or more represen- 
tative sequences for each motif. We chose a minimum of 
two sequences as a criterion since G3 has been previously 
reported by Dai et al. [23] and we only found two se- 
quences for this group in the I. ricinus salivary transcrip- 
tome. This grouping resulted in a total of 145 sequences 
that formed eleven groups (Gl-Gll). We additionally 
found several Kunitz-domain peptides (n = 7) in the /. rici- 
nus salivary transcriptome that possessed less than 6 Cys 
residues that were included in our phylogenetic analysis. 
These small peptides were defined as simple Kunitz' (SK) 
due their atypical Cys motif compared to archetypal 
Kunitz-domain peptides. Due to limitations of dealing 
with extremely small peptides and variations on their Cys 
motif, these SKs were used in only a few of our analyses. 
Thus, a total of 152 I. ricinus salivary Kunitz peptide se- 
quences were used in our subsequent analyses. 

Tertiary protein modeling and disulfide bridge 
predictions 

To predict the disulfide bond patterns we first modeled a 
representative of each group (those indicated in Figure 1) 
using the evolutionary protein model building web-server 
Phyre2 [74]. Using the Schrodingers Maestro software 
[75], each predicted 3D model was minimized, prepared, 
refined and re- minimized - minimization was used to 
remove steric clashes. Minimization uses a Truncated 
Newton algorithm [76], an OPLS-AA force field and a 



surface generalized Born (SGB) implicit solvent [77]. The 
total CPU time for minimizing was 2-3 min. After 
minimization, Cys resides were connected, hydrogen 
atoms were added, and the overall hydrogen-bond net- 
work was optimized using the Schrodinger s Protein Prep- 
aration Wizard [78], which optimizes the entire hydrogen 
bond network by means of side chain sampling. The Pro- 
tein Preparation Wizard analyzes the structure then builds 
a cluster of hydrogen bonds and 100000 Monte Carlo ori- 
entations for each cluster. The algorithm determines the 
quality of the hydrogen-bond network to produce an opti- 
mized structure. The modeled structures were then re- 
minimized. The disulfide bond patterns were verified 
using the web-server BRIDGED [79,80]. 

Amino acid (aa) variability (Shannon entropy) and 
secondary structure analysis 

Shannon entropy analysis [51] was used to calculate the 
aa variability. The Shannon entropy (H) was calculated 
for every position with the following equation: 

M 

H = -Y J PiVi 

i=l 

Pi is the amount of each aa type (/), and M is the number 
of aa types (maximum 20 aa). When H >2.0 the site is 
considered variable, while H <1.0 count for highly con- 
served positions [81]. Secondary structure was predicted 
using the position-specific scoring matrices method [82] 
from the PSIPRED server [82,83] . The presence (or not) of 
p-strands and a-helices, their length, amount, and position 
in the sequences were used as criteria for 2D model 
differentiation. 

Nucleotide substitution rate analysis (SNAP and 
Datamonkey server) 

We first performed a codon-based alignment using the 
GUIDANCE server [84] (MAFFT, Localpair, maxiterate 
1000 with 100 bootstraps) to analyze the nucleotide sub- 
stitution of the Kunitz encoding transcripts. We then 
submitted this codon-based alignment to the SNAP ser- 
ver that employs the Nei and Gojobori method [85]. Nei 
and Gojobori used the following equations to denote the 
type of substitution based on the assumption that a mu- 
tation in the second nucleotide position of any codon 
will cause a non-synonymous substitution, but muta- 
tions at the first or third position may cause either a 
non- synonymous or a synonymous substitution: 

i=l i=l 

Where s t and n t are the respective sites for synonym- 
ous (S) and non-synonymous (A/) substitution for the ith 
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codon with the total number of codons for the se- 
quences studied (r). Nei and Gojobori [85] also used the 
following formulas to estimate the synonymous (d s ) and 
the non-synonymous (d N ) substitutions per site by ap- 
plying the Jukes-Cantor formula [86]: 

d s = - 3 / 4 ln(l- 4 / 3 Ps) d N = - 3 / 4 ln(i- y 3PN ) 

We then used d s and d N to identify whether a group has 
undergone positive (>1) or purifying selection (<1); where 
p s and p N are the synonymous differences per synonym- 
ous site and non-synonymous differences per non- 
synonymous site, respectively. Standard deviation (a) and 
variance were also calculated by the SNAP server using 
the methods described by Nei and Gojobori [85]. 

Under the Datamonkey server [58] we used the algo- 
rithm PARRIS [60] to detect positive selection and the 
SLAC algorithm [59] was used to calculate the ratio d N ld s . 
A two-tailed extended binomial distribution set at p-value 
< 0.05 was used to assess significance of both algorithms. 
The SLAC and PARRIS algorithms use a neighbor-joining 
tree with a maximum likelihood for branch lengths and 
substitution rates. A few of our codon alignments con- 
tained multiple segments (i.e., recombination) as analyzed 
by GARD [87], therefore the base frequencies and substitu- 
tion rates obtained from GARD were inferred together 
from the entire alignment, while branch lengths were fitted 
to each segment separately. Based on these analyses a glo- 
bal d N /d s ratio was obtained. 

The SLAC algorithm was also used to detect which 
nucleotide substitution site were positively or negatively 
selected. For each synonymous and non-synonymous nu- 
cleotide substitution site, four measurements were made: 
normalized expected (ES and EN) and observed numbers 
(NS and NN). The SLAC algorithm then calculated: 

dN = NN/EN dS = NS/ES 

If dN < dS a codon was negatively selected and if dN > 
dS a codon was positively selected. 

Phylogenetic analysis of Kunitz-domain peptides from 
hematophagous arthropods 

All 152 /. ricinus Kunitz peptide sequences were used 
for a phylogenetic reconstruction of the Kunitz-domain 
family within ticks. Accordingly, we searched the NCBI 
database for Kunitz sequences from hard and soft ticks, 
and additionally spider sequences were included as out- 
group. Similar to the I. ricinus Monolaris, only secreted 
and full-length Kunitz-domain proteins with up to 7 Cys 
residues in their aa sequence were chosen. All candidate 
proteins were re-confirmed as single Kunitz-domain 
peptides using the Pfam server [72]. Additionally, se- 
quences of single Kunitz-domain proteins that have been 



functionally described were also included into the phylo- 
genetic analysis. 

A codon alignment was constructed using TranslatorX 
[88] and then all mature 249 single Kunitz-domain protein 
sequences were aligned using the program MAFFT v7 
[89]. For the multiple sequence alignment we applied an 
iterative refinement method (L-INS-I) with the BLOSUM 
62 matrix (New scoring scheme, gap opening penalty: 2.0, 
Offset value: 0.5). The protein alignment was then used in 
TranslatorX to align the corresponding Kunitz nucleotide 
sequences of ticks and spiders to create the final codon 
alignment. All unaligned, flanking sequence regions of the 
codon alignment were trimmed and all gaps were closed 
apart from gaps that were introduced by up to two se- 
quences only (final sequence lengths: 126 bp, Additional 
file 5). The final alignment was analyzed using jmodeltest 
v2.1.2 in order to select the best- fit nucleotide substitution 
model for the phylogenetic reconstruction [90]. A phylo- 
genetic tree was constructed under the maximum likeli- 
hood (ML) optimality criterion and the GTR model with a 
gamma distribution of substitution rate and estimated 
base frequencies using the program RAxML (HPC v7.2.6) 
[91]. Tree searching and bootstrapping were performed 
simultaneously (-f a, 1000 bootstrap replicates). 

Divergence time estimation 

The codon alignment of coding single Kunitz-domain 
proteins was also used to estimate divergence times 
among ticks using BEAST vl.7.5 [92]. For this Bayesian 
analysis, the following four taxon sets and divergence 
times (unit: million years ago, MYA) were calibrated ac- 
cording to Jeyaprakash and Hoy [63] using normal prior 
settings: Araneae/Scorpions/Pycnogonida/Acari 459 ± 
18 MYA (included taxa: all tick sequences), Argasidae 
214 ± 28 MYA (included taxa: all argasid sequences), 
Prostriata 196 ± 27 MYA (included taxa: all prostriate 
sequences) and Metastriata (included taxa: all metastri- 
ate sequences) 134 ± 22 MYA. All tick Kunitz-domain 
sequences in the Araneae/Scorpions/Pycnogonida/Acari 
taxon set were enforced to be kept monophyletic. Default 
settings were used for all other prior and operator settings 
in all analyses. Similar to the ML analysis, a GTR substitu- 
tion model with a gamma rate distribution across sites 
and estimated base frequencies were applied. Two inde- 
pendent Beast runs with four independent Markov-chain 
Monte Carlo (MCMC) chains and 100,000,000 genera- 
tions were performed. The starting trees were randomly 
generated and all trees were sampled every 10,000 genera- 
tions. Different molecular clock (uncorrelated exponential 
and lognormal relaxed clock) [93] and speciation models 
[94] (Yule process and birth-death process) were tested. A 
uniform prior distribution for the calculations of the mean 
branch rates under the uncorrelated lognormal or expo- 
nential relaxed molecular clock was set. All BEAST 
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analyses were assessed if each parameter converges on the 
same posterior distribution in the MCMC runs and if an 
effective sample size was reached using Tracer vl.5 (http:// 
tree.bio.ed.ac.uk/software/tracer). Additionally, the mar- 
ginal likelihoods were estimated using path sampling and 
step stone sampling methods of each analysis in order to 
select the appropriate model for our analysis [95]. Accord- 
ing to our evaluation of the different BEAST runs, the un- 
corrected relaxed lognormal molecular clock model was 
chosen with the Yule tree prior for speciation. The two tree 
files from the latter BEAST analyses were combined using 
LogCombiner vl.7.5 from the BEAST software package 
using a burn-in of 10% of all sampled trees was set. Finally, 
TreeAnnotator vl.7.5 was used to summarize all trees in 
order to obtain a single maximum clade credibility tree in- 
cluding mean node heights. 

Additional files 



sequences and from other tick and spider sequences was constructed using 
TranslatorX [88], Firstly, all mature 249 single Kunitz sequences were aligned 
using the program MAF^ v7 [89]. Secondly, the protein alignment was 
used in TranslatorX to align the corresponding Kunitz nucleotide sequences 
of ticks and spiders to create the final codon alignment. All unaligned, 
flanking sequence regions of the codon alignment were trimmed and all 
gaps were closed apart from gaps that were introduced by up to two 
sequences only (final sequence lengths: 126 bp). 
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